Validation of average error rate over classifiers
Abstract
We examine methods to estimate the average and variance of test error rates over a set of classifiers. We begin with the process of drawing a classifier at random for each example. Given validation data, the average test error rate can be estimated as if validating a single classifier. Given the test example inputs, the variance can be computed exactly. Next, we consider the process of drawing a classifier at random and using it on all examples. Once again, the expected test error rate can be validated as if validating a single classifier. However, the variance must be estimated by validating all classifiers, which yields loose or uncertain bounds.
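The two randomization schemes in the abstract can be illustrated with a short numerical sketch. The following is not taken from the paper: it is a minimal numpy example that assumes k binary classifiers whose 0/1 predictions on n test inputs are available as an array, and all variable names and the toy data are illustrative assumptions.

```python
import numpy as np

# Toy setup (illustrative only): k binary classifiers applied to n test inputs.
# predictions[j, i] is classifier j's 0/1 label for test input i.
rng = np.random.default_rng(0)
k, n = 5, 1000
predictions = rng.integers(0, 2, size=(k, n))
true_labels = rng.integers(0, 2, size=n)   # unknown in practice; used here only to form error rates

# Per-classifier test error rates (what validation data would estimate in practice).
per_classifier_error = (predictions != true_labels).mean(axis=1)

# Scheme 1: draw a classifier uniformly at random for each example.
# The expected test error rate is the average of the classifiers' error rates.
expected_error = per_classifier_error.mean()

# Its variance over the random draws depends only on how the classifiers split
# their votes on each test input. For binary labels, the fraction of classifiers
# that err on input i is either q_i or 1 - q_i, and q_i * (1 - q_i) is the same
# either way, so the variance needs only the unlabeled test inputs.
q = predictions.mean(axis=0)               # fraction of classifiers predicting label 1 per input
variance_per_example_draw = np.sum(q * (1 - q)) / n**2

# Scheme 2: draw one classifier at random and use it on all examples.
# The expected error rate is the same, but the variance is the spread of the
# classifiers' whole-test-set error rates, which requires validating each classifier.
variance_single_draw = per_classifier_error.var()

print(expected_error, variance_per_example_draw, variance_single_draw)
```

In this sketch the first variance is exact given the test inputs, while the second must be estimated from per-classifier validation, which is the source of the looser bounds mentioned above.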
Similar references
Ensemble Validation: Selectivity has a Price, but Variety is Free
If classifiers are selected from a hypothesis class to form an ensemble, bounds on average error rate over the selected classifiers include a component for selectivity, which grows as the fraction of hypothesis classifiers selected for the ensemble shrinks, and a component for variety, which grows with the size of the hypothesis class or in-sample data set. We show that the component for select...
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction
BACKGROUND In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively...
SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability using Machine Learning Ensembles
Improving the accuracy of supervised classification algorithms in biomedical applications, especially CADx, is an active area of research. This paper proposes the construction of a rotation forest (RF) ensemble using 20 learners over two clinical datasets, namely lymphography and backache. We propose a new feature selection strategy based on support vector machines optimized by particle swarm optimiza...
Journal: Pattern Recognition Letters
Volume: 19, Issue: -
Pages: -
Publication date: 1998